Julia Lesson

By Andrew Ma and Luke Miller

What is Julia?

Julia is a high-level, high-performance dynamic programming language whose syntax looks like Ruby or Python meets MATLAB. It is meant to bridge the gap between mathematics and programming while also being very efficient at crunching numbers. Most of Julia's base library is written in Julia itself (woo metaprogramming).


In [1]:
# variable
x = 10
println(x)

# super hard math
y = x + 1
println(y)

# reassigning a variable
x = x + 1
println(x)

# unicode names
δ = 0.00001
println(δ)

안녕하세요 = "Hello"
println(안녕하세요)


10
11
11
1.0e-5
Hello

Stylistic Conventions:

  • Names of variables are in lower case.
  • Word separation can be indicated by underscores ('_'), but use of underscores is discouraged unless the name would be hard to read otherwise.
  • Names of Types and Modules begin with a capital letter and word separation is shown with upper camel case instead of underscores.
  • Names of functions and macros are in lower case, without underscores.
  • Functions that write to their arguments have names that end in !. These are sometimes called “mutating” or “in-place” functions because they are intended to produce changes in their arguments after the function is called, not just return a value.
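
The ! convention can be seen with two functions from Julia's standard library, sort and sort!. This is a minimal sketch; the variable names are only for illustration.

```julia
# sort returns a new sorted array and leaves its argument unchanged
scores = [3, 1, 2]
sorted_copy = sort(scores)
println(scores)        # still in the original order
println(sorted_copy)

# sort! sorts its argument in place, as the trailing ! signals
sort!(scores)
println(scores)
```

Choosing between the two forms is a trade-off: the mutating version avoids allocating a new array, while the non-mutating version keeps the original data intact.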

In [2]:
# Overflow example
x = typemax(Int64)
println(x)
println(x+1)


9223372036854775807
-9223372036854775808
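
Int64 arithmetic wraps around silently, as the output above shows. Here is a hedged sketch of two ways to avoid that: promoting to an arbitrary-precision BigInt with big, or using Base.checked_add, which throws an OverflowError instead of wrapping. The variable names are illustrative.

```julia
largest = typemax(Int64)

# Wraparound: adding 1 to the largest Int64 yields the smallest Int64
println(largest + 1 == typemin(Int64))

# Arbitrary-precision integers do not overflow
println(big(largest) + 1)

# Checked arithmetic raises an error instead of wrapping
overflowed = try
    Base.checked_add(largest, 1)
    false
catch err
    err isa OverflowError
end
println(overflowed)
```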

In [3]:
# Coefficients
x = 3
println(2x^2 - 3x + 1)
println(1.5x^2 - .5x + 1)
println(2^2x)


10
13.0
64
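
A numeric literal written directly before a variable is treated as multiplication. Julia's parsing rules read 2x^2 as 2*(x^2) and 2^2x as 2^(2x), which is why the last line above prints 64 rather than 12. A small check, with x chosen as in the cell above:

```julia
x = 3

# 2x^2 means 2 * (x^2), not (2x)^2
println(2x^2 == 2 * x^2)

# 2^2x means 2^(2x), so the exponent is 6 here
println(2^2x == 2^(2 * x))

# Parentheses give the other reading
println((2^2) * x)
```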

In [4]:
# zero and one return the additive and multiplicative identity for the argument's type

println(zero(1.0))
println(one(0))


0.0
1

In [5]:
# Char
a = 'a'
println(a)

# String (named str so it does not shadow Julia's built-in string function)
str = "I'm a string"
println(str)


a
I'm a string

In [6]:
# Functions
function e(x,y)
    x+y
end

function e2(x,y)
    x+y, x-y
end

f(x,y) = x + y
g = f 

# function names can be unicode too
∑(x,y) = x + y

println(e(1,2))
println(e2(1,2))
println(f(1,2))
println(g(1,2))
println(∑(1,2))


3
(3,-1)
3
3
3

In [7]:
# Functions continued
println(+(1, 2, 3))
h = +
println(h(1,2,3))

println(map(x -> x^2 + 2x - 1, [1,3,-1]))

bar(a,b,x...) = (a,b,x)
println(bar(1,2,3,4,5,6))

function optionalArg(x,y,z=0)
    x+y+z
end

println(optionalArg(1,2))
println(optionalArg(1,2,3))


6
6
[2,14,-2]
(1,2,(3,4,5,6))
3
6

In [8]:
# Scope
module A
a = 1 # a global in A's scope
end

module B
# b = a # would error as B's global scope is separate from A's
    module C
    c = 2
    end
b = C.c # can access the namespace of a nested global scope
        # through a qualified access
import A # makes module A available
d = A.a
# A.a = 2 # would error with: "ERROR: cannot assign variables in other modules"
end


Out[8]:
B

In [9]:
# Method
k(x::Number, y::Number) = 2x - y;
println(k(1,2))


0

In [10]:
# Things with types and arrays

num = 12
println(typeof(num))
println(convert(UInt8, num))

numArray = Any[1 2 3; 4 5 6]
println(typeof(numArray))
println(numArray)
convert(Array{Float64}, numArray)

# Define your own conversion
import Base.convert
convert(::Type{Bool}, x::Real) = x==0 ? false : x==1 ? true : throw(InexactError())
println(convert(Bool, 1))
println(convert(Bool, 0))


Int64
12
Array{Any,2}
Any[1 2 3; 4 5 6]
true
false

Type System and Polymorphism

Dynamic, with some of the advantages of static typing! You can add type annotations that tell the compiler what concrete type a value is expected to have.


In [32]:
1+2


Out[32]:
3

In [11]:
(1+2)::AbstractFloat


TypeError: typeassert: expected AbstractFloat, got Int64

In [12]:
(1+2)::Int


Out[12]:
3

Julia has a nice way to call a different method based on the types of the arguments passed to a function: multiple dispatch. Julia determines which method to dispatch the call to at run time.

Example function headers:

    function collide(me::Circle, other::Rectangle)
    function collide(me::Polygon, other::Circle)
    function collide(me::Polygon, other::Rectangle)

Then, when you call collide(me, other), Julia dispatches the call to the method that matches the types of both arguments.
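
The collide headers above can be turned into a runnable sketch. The Shape, Circle, Rectangle, and Polygon types, their fields, and the returned strings are all hypothetical, invented here for illustration. The code assumes Julia 0.6 or later, where types are declared with struct and abstract type rather than the older type keyword used elsewhere in this lesson.

```julia
abstract type Shape end

struct Circle <: Shape
    radius::Float64
end

struct Rectangle <: Shape
    width::Float64
    height::Float64
end

struct Polygon <: Shape
    sides::Int
end

# One generic function, three methods; Julia picks the method from the
# run-time types of both arguments
collide(me::Circle, other::Rectangle) = "circle hits rectangle"
collide(me::Polygon, other::Circle) = "polygon hits circle"
collide(me::Polygon, other::Rectangle) = "polygon hits rectangle"

println(collide(Circle(1.0), Rectangle(2.0, 3.0)))
println(collide(Polygon(5), Circle(1.0)))
println(collide(Polygon(5), Rectangle(2.0, 3.0)))

# No method covers two circles, so calling collide on them would
# raise a MethodError; applicable checks this without calling
println(applicable(collide, Circle(1.0), Circle(2.0)))
```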


In [13]:
type Point
    x::Float32
    y::Float32
end

type Vector2D
    x::Float32
    y::Float32
end

type UnitVector2D
    x::Float32
    y::Float32

    # norm has no method for Vector2D, so compute the length directly
    UnitVector2D(v::Vector2D) = (len = sqrt(v.x^2 + v.y^2); new(v.x/len, v.y/len))
end

In [14]:
#Union Types:
VecOrUnit = Union{Vector2D, UnitVector2D}
dot(u::VecOrUnit, v::VecOrUnit) = u.x*v.x + u.y*v.y


Out[14]:
dot (generic function with 1 method)


In [15]:
# Generate random 4x4 array
randomArray = rand(4,4)


Out[15]:
4×4 Array{Float64,2}:
 0.277027  0.0456322  0.21335   0.574605 
 0.205414  0.0257954  0.719418  0.284063 
 0.548858  0.891233   0.172288  0.0619572
 0.810177  0.959946   0.841337  0.317503 

In [16]:
# Broadcasting applies a binary operation element by element across arrays
broadcast(+, randomArray, randomArray)


Out[16]:
4×4 Array{Float64,2}:
 0.554053  0.0912644  0.426701  1.14921 
 0.410828  0.0515908  1.43884   0.568127
 1.09772   1.78247    0.344576  0.123914
 1.62035   1.91989    1.68267   0.635006
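
Broadcasting also stretches a scalar or a smaller array to match the other argument. The sketch below uses small hand-written arrays (not the random array above) so the results are easy to check by eye. The dotted operator .+ is shorthand for the same operation.

```julia
column = [1, 2, 3]

# A scalar is applied to every element
println(broadcast(+, column, 10))

# The dotted operator gives the same result
println(column .+ 10 == broadcast(+, column, 10))

# A 3-element column combined with a 1×2 row gives a 3×2 matrix
row = [10 20]
grid = broadcast(+, column, row)
println(size(grid))
println(grid)
```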

Why use Julia?

Julia is fast! In the figure, the benchmark times are relative to C, where C = 1.0. You can even call C code directly if you need even more speed.


In [1]:
# Calling C code
t = ccall( (:clock, "libc"), Int32, ())
println(t)

path = ccall((:getenv, "libc"), Cstring, (Cstring,), "SHELL")
unsafe_string(path)


3665644
Out[1]:
"/bin/bash"
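
ccall takes the function to call (optionally paired with a library name), the return type, a tuple of argument types, and then the arguments themselves. As a further sketch, the call below asks the C library's strlen for the length of a string. It assumes strlen is visible in the running process, which is normally the case on Linux and macOS.

```julia
# Return type Csize_t, one argument of type Cstring
len = ccall(:strlen, Csize_t, (Cstring,), "Julia")
println(Int(len))
```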

Julia is designed for parallelization and does not impose any particular style of parallelism on its users. The following example demonstrates how to count the number of heads in a large number of coin tosses in parallel.


In [18]:
nheads = @parallel (+) for i=1:100000000
  rand(Bool)
end


Out[18]:
49997452

In [19]:
@time nheads = @parallel (+) for i=1:100000000
  rand(Bool)
end


  3.035767 seconds (200.02 M allocations: 2.981 GB, 7.63% gc time)
Out[19]:
49997575
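
In Julia 1.0 and later, @parallel was renamed @distributed and lives in the Distributed standard library. Below is a sketch of the same coin-toss count for those versions. The smaller number of tosses is an arbitrary choice to keep it quick, and with no worker processes added the loop simply runs on the current process.

```julia
using Distributed

ntosses = 1_000_000

# (+) is the reducer: the value of each iteration is summed across workers
nheads = @distributed (+) for i = 1:ntosses
    Int(rand(Bool))
end

println(nheads)
```

To actually spread the work across processes, workers would be added with addprocs before running the loop.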

DataFrames


In [2]:
using DataFrames

In [21]:
# DataArray
dv = @data([NA, 3, 2, 5, 4])
println(mean(dv))

println(mean(dropna(dv)))

convert(Array, dropna(dv))

println(dv)

# replacing NAs with a default value
dv = @data([NA, 3, 2, 5, 4])
println(convert(Array, dv, 11))


NA
3.5
[NA,3,2,5,4]
[11,3,2,5,4]
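
DataArrays and NA belong to older versions of the DataFrames ecosystem. In Julia 1.x the built-in missing value plays a similar role. The sketch below mirrors the cell above using only Base and the Statistics standard library; the variable name data is illustrative.

```julia
using Statistics

data = [missing, 3, 2, 5, 4]

# Arithmetic involving missing propagates it, so the mean is missing
println(mean(data))

# skipmissing drops the missing entries, much like dropna
println(mean(skipmissing(data)))

# coalesce. replaces each missing entry with a default value
println(coalesce.(data, 11))
```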

In [22]:
df = DataFrame(A = 1:10, B = ["M", "F", "F", "M", "F", "M", "F", "F", "M", "M"])


Out[22]:
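│ Row │ A  │ B │
├─────┼────┼───┤
│ 1   │ 1  │ M │
│ 2   │ 2  │ F │
│ 3   │ 3  │ F │
│ 4   │ 4  │ M │
│ 5   │ 5  │ F │
│ 6   │ 6  │ M │
│ 7   │ 7  │ F │
│ 8   │ 8  │ F │
│ 9   │ 9  │ M │
│ 10  │ 10 │ M │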

In [23]:
println(head(df))
println(tail(df))
println(df[1:3, :])


6×2 DataFrames.DataFrame
│ Row │ A │ B   │
├─────┼───┼─────┤
│ 1   │ 1 │ "M" │
│ 2   │ 2 │ "F" │
│ 3   │ 3 │ "F" │
│ 4   │ 4 │ "M" │
│ 5   │ 5 │ "F" │
│ 6   │ 6 │ "M" │
6×2 DataFrames.DataFrame
│ Row │ A  │ B   │
├─────┼────┼─────┤
│ 1   │ 5  │ "F" │
│ 2   │ 6  │ "M" │
│ 3   │ 7  │ "F" │
│ 4   │ 8  │ "F" │
│ 5   │ 9  │ "M" │
│ 6   │ 10 │ "M" │
3×2 DataFrames.DataFrame
│ Row │ A │ B   │
├─────┼───┼─────┤
│ 1   │ 1 │ "M" │
│ 2   │ 2 │ "F" │
│ 3   │ 3 │ "F" │

In [24]:
describe(df)


A
Min      1.0
1st Qu.  3.25
Median   5.5
Mean     5.5
3rd Qu.  7.75
Max      10.0
NAs      0
NA%      0.0%

B
Length  10
Type    String
NAs     0
NA%     0.0%
Unique  2


In [25]:
println(mean(df[:A]))
println(median(df[:A]))


5.5
5.5

In [26]:
df2 = DataFrame(A = 1:4, B = randn(4))
println(df2)
colwise(cumsum, df2)


4×2 DataFrames.DataFrame
│ Row │ A │ B        │
├─────┼───┼──────────┤
│ 1   │ 1 │ -1.11442 │
│ 2   │ 2 │ -2.34239 │
│ 3   │ 3 │ -0.53598 │
│ 4   │ 4 │ 1.0139   │
Out[26]:
2-element Array{Any,1}:
 DataArrays.DataArray{Int64,1}[[1,3,6,10]]                             
 DataArrays.DataArray{Float64,1}[[-1.11442,-3.45681,-3.99279,-2.97889]]

Example


In [3]:
dataframe = readtable("train.csv")
head(dataframe)


Out[3]:
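│ Row │ PassengerId │ Survived │ Pclass │ Name                                                │ Sex    │ Age  │ SibSp │ Parch │ Ticket           │ Fare    │ Cabin │ Embarked │
├─────┼─────────────┼──────────┼────────┼─────────────────────────────────────────────────────┼────────┼──────┼───────┼───────┼──────────────────┼─────────┼───────┼──────────┤
│ 1   │ 1           │ 0        │ 3      │ Braund, Mr. Owen Harris                             │ male   │ 22.0 │ 1     │ 0     │ A/5 21171        │ 7.25    │ NA    │ S        │
│ 2   │ 2           │ 1        │ 1      │ Cumings, Mrs. John Bradley (Florence Briggs Thayer) │ female │ 38.0 │ 1     │ 0     │ PC 17599         │ 71.2833 │ C85   │ C        │
│ 3   │ 3           │ 1        │ 3      │ Heikkinen, Miss. Laina                              │ female │ 26.0 │ 0     │ 0     │ STON/O2. 3101282 │ 7.925   │ NA    │ S        │
│ 4   │ 4           │ 1        │ 1      │ Futrelle, Mrs. Jacques Heath (Lily May Peel)        │ female │ 35.0 │ 1     │ 0     │ 113803           │ 53.1    │ C123  │ S        │
│ 5   │ 5           │ 0        │ 3      │ Allen, Mr. William Henry                            │ male   │ 35.0 │ 0     │ 0     │ 373450           │ 8.05    │ NA    │ S        │
│ 6   │ 6           │ 0        │ 3      │ Moran, Mr. James                                    │ male   │ NA   │ 0     │ 0     │ 330877           │ 8.4583  │ NA    │ Q        │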


In [4]:
dataframe[:familysize] = dataframe[:SibSp] + dataframe[:Parch]
head(dataframe)


Out[4]:
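│ Row │ PassengerId │ Survived │ Pclass │ Name                                                │ Sex    │ Age  │ SibSp │ Parch │ Ticket           │ Fare    │ Cabin │ Embarked │ familysize │
├─────┼─────────────┼──────────┼────────┼─────────────────────────────────────────────────────┼────────┼──────┼───────┼───────┼──────────────────┼─────────┼───────┼──────────┼────────────┤
│ 1   │ 1           │ 0        │ 3      │ Braund, Mr. Owen Harris                             │ male   │ 22.0 │ 1     │ 0     │ A/5 21171        │ 7.25    │ NA    │ S        │ 1          │
│ 2   │ 2           │ 1        │ 1      │ Cumings, Mrs. John Bradley (Florence Briggs Thayer) │ female │ 38.0 │ 1     │ 0     │ PC 17599         │ 71.2833 │ C85   │ C        │ 1          │
│ 3   │ 3           │ 1        │ 3      │ Heikkinen, Miss. Laina                              │ female │ 26.0 │ 0     │ 0     │ STON/O2. 3101282 │ 7.925   │ NA    │ S        │ 0          │
│ 4   │ 4           │ 1        │ 1      │ Futrelle, Mrs. Jacques Heath (Lily May Peel)        │ female │ 35.0 │ 1     │ 0     │ 113803           │ 53.1    │ C123  │ S        │ 1          │
│ 5   │ 5           │ 0        │ 3      │ Allen, Mr. William Henry                            │ male   │ 35.0 │ 0     │ 0     │ 373450           │ 8.05    │ NA    │ S        │ 0          │
│ 6   │ 6           │ 0        │ 3      │ Moran, Mr. James                                    │ male   │ NA   │ 0     │ 0     │ 330877           │ 8.4583  │ NA    │ Q        │ 0          │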

In [5]:
dataframe[:Age] = convert(Array, dataframe[:Age], mean(dropna(dataframe[:Age])))
head(dataframe)


Out[5]:
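│ Row │ PassengerId │ Survived │ Pclass │ Name                                                │ Sex    │ Age               │ SibSp │ Parch │ Ticket           │ Fare    │ Cabin │ Embarked │ familysize │
├─────┼─────────────┼──────────┼────────┼─────────────────────────────────────────────────────┼────────┼───────────────────┼───────┼───────┼──────────────────┼─────────┼───────┼──────────┼────────────┤
│ 1   │ 1           │ 0        │ 3      │ Braund, Mr. Owen Harris                             │ male   │ 22.0              │ 1     │ 0     │ A/5 21171        │ 7.25    │ NA    │ S        │ 1          │
│ 2   │ 2           │ 1        │ 1      │ Cumings, Mrs. John Bradley (Florence Briggs Thayer) │ female │ 38.0              │ 1     │ 0     │ PC 17599         │ 71.2833 │ C85   │ C        │ 1          │
│ 3   │ 3           │ 1        │ 3      │ Heikkinen, Miss. Laina                              │ female │ 26.0              │ 0     │ 0     │ STON/O2. 3101282 │ 7.925   │ NA    │ S        │ 0          │
│ 4   │ 4           │ 1        │ 1      │ Futrelle, Mrs. Jacques Heath (Lily May Peel)        │ female │ 35.0              │ 1     │ 0     │ 113803           │ 53.1    │ C123  │ S        │ 1          │
│ 5   │ 5           │ 0        │ 3      │ Allen, Mr. William Henry                            │ male   │ 35.0              │ 0     │ 0     │ 373450           │ 8.05    │ NA    │ S        │ 0          │
│ 6   │ 6           │ 0        │ 3      │ Moran, Mr. James                                    │ male   │ 29.69911764705882 │ 0     │ 0     │ 330877           │ 8.4583  │ NA    │ Q        │ 0          │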

In [6]:
head(dataframe[dataframe[:Sex] .== "male", :])


Out[6]:
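│ Row │ PassengerId │ Survived │ Pclass │ Name                           │ Sex  │ Age               │ SibSp │ Parch │ Ticket    │ Fare    │ Cabin │ Embarked │ familysize │
├─────┼─────────────┼──────────┼────────┼────────────────────────────────┼──────┼───────────────────┼───────┼───────┼───────────┼─────────┼───────┼──────────┼────────────┤
│ 1   │ 1           │ 0        │ 3      │ Braund, Mr. Owen Harris        │ male │ 22.0              │ 1     │ 0     │ A/5 21171 │ 7.25    │ NA    │ S        │ 1          │
│ 2   │ 5           │ 0        │ 3      │ Allen, Mr. William Henry       │ male │ 35.0              │ 0     │ 0     │ 373450    │ 8.05    │ NA    │ S        │ 0          │
│ 3   │ 6           │ 0        │ 3      │ Moran, Mr. James               │ male │ 29.69911764705882 │ 0     │ 0     │ 330877    │ 8.4583  │ NA    │ Q        │ 0          │
│ 4   │ 7           │ 0        │ 1      │ McCarthy, Mr. Timothy J        │ male │ 54.0              │ 0     │ 0     │ 17463     │ 51.8625 │ E46   │ S        │ 0          │
│ 5   │ 8           │ 0        │ 3      │ Palsson, Master. Gosta Leonard │ male │ 2.0               │ 3     │ 1     │ 349909    │ 21.075  │ NA    │ S        │ 4          │
│ 6   │ 13          │ 0        │ 3      │ Saundercock, Mr. William Henry │ male │ 20.0              │ 0     │ 0     │ A/5. 2151 │ 8.05    │ NA    │ S        │ 0          │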

Accessing available public classic datasets


In [7]:
using RDatasets
iris = dataset("datasets", "iris")
head(iris)


Out[7]:
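│ Row │ SepalLength │ SepalWidth │ PetalLength │ PetalWidth │ Species │
├─────┼─────────────┼────────────┼─────────────┼────────────┼─────────┤
│ 1   │ 5.1         │ 3.5        │ 1.4         │ 0.2        │ setosa  │
│ 2   │ 4.9         │ 3.0        │ 1.4         │ 0.2        │ setosa  │
│ 3   │ 4.7         │ 3.2        │ 1.3         │ 0.2        │ setosa  │
│ 4   │ 4.6         │ 3.1        │ 1.5         │ 0.2        │ setosa  │
│ 5   │ 5.0         │ 3.6        │ 1.4         │ 0.2        │ setosa  │
│ 6   │ 5.4         │ 3.9        │ 1.7         │ 0.4        │ setosa  │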
